Automated Data Pre-processing via Meta-learning
Abstract
A data mining algorithm may perform differently on datasets with different characteristics; for example, it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. In practice, a dataset usually needs to be pre-processed. Given all the possible pre-processing operators, the number of alternatives is staggeringly large, and inexperienced users become overwhelmed. We show that this problem can be addressed by an automated approach that leverages ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on that dataset. Our approach will help non-expert users identify the transformations appropriate to their applications more effectively, and hence achieve improved results.
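The prediction step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two meta-features, the transformation names, the `HISTORY` records, and the use of a nearest-neighbour vote as the meta-learner are all hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical meta-knowledge base for ONE data mining algorithm.
# Each record: (dataset meta-features, transformation name, did the result improve?).
# Assumed meta-features: (fraction of categorical attributes, fraction of missing values).
HISTORY = [
    ((0.9, 0.0), "discretize", False),
    ((0.1, 0.0), "discretize", True),
    ((0.2, 0.1), "discretize", True),
    ((0.8, 0.3), "impute_missing", True),
    ((0.5, 0.4), "impute_missing", True),
    ((0.3, 0.0), "impute_missing", False),
]


def recommend(meta_features, history, k=1):
    """Return the transformations predicted to improve the algorithm's result.

    For each transformation, look at the k past datasets whose meta-features
    are closest to the new dataset and take a majority vote on "improved".
    """
    by_transformation = defaultdict(list)
    for features, transformation, improved in history:
        distance = dist(features, meta_features)
        by_transformation[transformation].append((distance, improved))

    recommended = []
    for transformation, scored in by_transformation.items():
        nearest = sorted(scored, key=lambda pair: pair[0])[:k]
        votes = sum(1 for _, improved in nearest if improved)
        if votes * 2 > len(nearest):  # strict majority of neighbours improved
            recommended.append(transformation)
    return sorted(recommended)


# A mostly numeric dataset with few missing values.
print(recommend((0.15, 0.05), HISTORY))
```

In this sketch one such knowledge base would be kept per data mining algorithm, so the same dataset can receive different recommendations depending on the algorithm the user intends to run.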
Similar resources
Automated Data Cleansing through Meta-Learning
Data preprocessing or cleansing is one of the biggest hurdles in industry for developing successful machine learning applications. The process of data cleansing includes data imputation, feature normalization & selection, dimensionality reduction, and data balancing applications. Currently such preprocessing is manual. One approach for automating this process is meta-learning. In this paper we ...
Automated Detection of Multiple Sclerosis Lesions Using Texture-based Features and a Hybrid Classifier
Background: Multiple Sclerosis (MS) is the most frequent non-traumatic neurological disease capable of causing disability in young adults. Detection of MS lesions with magnetic resonance imaging (MRI) is the most common technique. However, manual interpretation of vast amounts of data is often tedious and error-prone. Furthermore, changes in lesions are often subtle and extremely unrepresentati...
The Effect of Teaching Meta-cognition Package on Self-Directed Learning in Medical Records Students of Isfahan University of Medical Sciences
Introduction: The ongoing rapid changes in science create a need in higher education for independent and self-directed learners. This study examines the effect of meta-cognition package training on self-directed learning in medical records students. Methods: In this quasi-experimental study using a two-group design with pre-test and post-test, 24 female and male medical records students were...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
Diagnostic predictors of learning readiness for pre-school children: A meta-analytic review
Background and Objectives: Students with learning difficulties encounter poorer school outcomes and major problems in learning. Researchers investigated the factors in the preschool stage that help to diagnose learning problems. Methods: A meta-analytic review provides means for assessing which factors show the strongest effects on long-term outcomes. Results: This article presents a ...
Publication date: 2016